Abstract. The Dempster-Shafer (DS) theory is a powerful tool for
probabilistic reasoning based on a formal calculus for combining evi- 
dence. DS theory has been widely used in computer science and engi- 
neering applications, but has yet to reach the statistical mainstream, 
perhaps because the DS belief functions do not satisfy long-run fre- 
quency properties. Recently, two of the authors proposed an extension 
of DS, called the weak belief (WB) approach, that can incorporate de- 
sirable frequency properties into the DS framework by systematically 
enlarging the focal elements. The present paper reviews and extends 
this WB approach. We present a general description of WB in the 
context of inferential models, its interplay with the DS calculus, and 
the maximal belief solution. New applications of the WB method in 
two high-dimensional hypothesis testing problems are given. Simula- 
tions show that the WB procedures, suitably calibrated, perform well 
compared to popular classical methods. Most importantly, the WB ap- 
proach combines the probabilistic reasoning of DS with the desirable 
frequency properties of classical statistics. 

Key words and phrases: Bayesian, belief functions, fiducial argument, 
frequentist, hypothesis testing, inferential model, nonparametrics.

1. INTRODUCTION

A statistical analysis often begins with an iterative
process of model-building, an attempt to understand
the observed data. The end result is what we call a
sampling model (a model that describes the
data-generating mechanism) that depends on a set
of unknown parameters. More formally, let $X \in \mathbb{X}$
denote the observable data, and $\Theta \in \mathbb{T}$ the
parameter of interest. Suppose the sampling model
$X \sim P_\Theta$ can be represented by a pair consisting of
(i) an equation

(1.1) $X = a(\Theta, U)$,

where $U \in \mathbb{U}$ is called the auxiliary variable, and
(ii) a probability measure $\mu$ defined on measurable
subsets of $\mathbb{U}$. We call (1.1) the a-equation, and $\mu$
the pivotal measure. This representation is similar
to that of Fraser [11], and familiar in the context
of random data generation, where a random draw
$U \sim \mu$ is mapped, via (1.1), to a variable $X$ with the
prescribed distribution depending on known $\Theta$. For
example, to generate a random variable $X$ having an
exponential distribution with fixed rate $\Theta = \theta$, one
might draw $U \sim \mathrm{Unif}(0, 1)$ and set $X = -\theta^{-1} \log U$.
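
As a minimal sketch of this data-generation recipe, the following Python
fragment draws from the exponential model through its a-equation. The
function names `a` and `draw_x`, the rate value and the seed are
illustrative assumptions, not part of the original development.

```python
import math
import random


def a(theta: float, u: float) -> float:
    """a-equation for the exponential model: X = -(1/theta) * log(U)."""
    return -math.log(u) / theta


def draw_x(theta: float, rng: random.Random) -> float:
    """Draw U from the pivotal measure Unif(0, 1) and map it to X."""
    # rng.random() lies in [0, 1); using 1 - U keeps the draw uniform
    # while avoiding log(0).
    u = 1.0 - rng.random()
    return a(theta, u)


rng = random.Random(0)   # fixed seed, chosen only for reproducibility
theta = 2.0              # hypothetical rate; the mean of X is then 1/theta
sample = [draw_x(theta, rng) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
print(f"sample mean = {sample_mean:.4f}, target 1/theta = {1 / theta:.4f}")
```

The only randomness enters through $U$; the parameter acts purely through
the deterministic map, which is the feature the inferential development
exploits.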
For inference, uncertainty about $\Theta$ is typically
derived directly from the sampling model, without any
additional considerations. But Fisher [10] highlighted
the fundamental difference between sampling and
inference, suggesting that the two problems should
be, somehow, kept separate. Here we take a new
approach in which inference is not determined by the
sampling model alone: a so-called inferential model
is built to handle posterior uncertainty separately.

Since the early 1900s, statisticians have strived 
for inferential methods capable of producing pos- 
terior probability-based conclusions with limited or 
no prior assumptions. In Section 2 we describe two 
major steps in this direction. The first major step, 
coming in the 1930s, was Fisher's fiducial argument, 
which uses a "pivotal quantity" to produce a poste- 
rior distribution with no prior assumptions on the 
parameter of interest. Limitations and inconsisten- 
cies of the fiducial argument have kept it from be- 
coming widely accepted. A second major step, made 
by Dempster in the 1960s, extended both Bayesian
and fiducial inference. Dempster uses (1.1) to construct
a probability model on a class of subsets of
$\mathbb{X} \times \mathbb{T}$ such that conditioning on $\Theta$ produces
the sampling model, and conditioning on the observed data
$X$ generates a set of upper and lower posterior
probabilities for the unknown parameter $\Theta$. Dempster [6]
argues that this uncertainty surrounding the exact 
posterior probability is not an inconvenience but, 
rather, an essential component of the analysis. In 
the 1970s, Shafer [18] extended Dempster's calculus 
of upper and lower probabilities into a general the- 
ory of evidence. Since then, the resulting Dempster- 
Shafer (DS) theory has been widely used in com- 
puter science and engineering applications but has 
yet to make a substantial impact in statistics. One 
possible explanation for this slow acceptance is the 
fact that the DS upper and lower probabilities are 
personal and do not satisfy the familiar long-run 
frequency properties under repeated sampling. 

Zhang and Liu [25] have recently proposed a vari- 
ation of DS inference that does have some of the de- 
sired frequency properties. The goal of the present 
paper is to review and extend the work of Zhang 
and Liu [25] on the theory of statistical inference 
with weak beliefs (WBs). The WB method starts 
with a belief function on X x T, but before condi- 
tioning on the observed data X, a weakening step is 
taken whereby the focal elements are sufficiently en- 
larged so that some desirable frequency properties 
are realized. The belief function is weakened only 
enough to achieve the desired properties. This is 
accomplished by choosing a "most efficient" belief
function from those which are sufficiently weak;
this belief is called the maximal belief (MB) solution.

To emphasize the main objective of WB, namely, 
modifying belief functions to obtain desirable fre- 
quency properties, we present a new concept here 
called an inferential model (IM). Simply put, an IM 
is a belief function that is bounded from above by 
the conventional DS posterior belief function. For 
the special case considered here, where the sampling
model can be described by the a-equation (1.1) and
the pivotal measure $\mu$, we consider IMs generated by
using random sets to predict the unobserved value
of the auxiliary variable $U$.

The remainder of the paper is organized as fol- 
lows. Since WBs are built upon the DS framework, 
the necessary DS notation and concepts will be in- 
troduced in Section 2. Then, in Section 3, we de- 
scribe the new approach to prior-free posterior in- 
ference based on the idea of IMs. Zhang and Liu's 
WB method is used to construct an IM, completely 
within the belief function framework, and the desir- 
able frequency properties of the resulting MB solu- 
tion follow immediately from this construction. Sec- 
tions 4 and 5 give detailed WB analyses of two high- 
dimensional hypothesis testing problems, and com- 
pare the MB procedures in simulations to popular 
frequentist methods. Some concluding remarks are
made in Section 6. 

2. FIDUCIAL AND DEMPSTER-SHAFER 
INFERENCE 

The goal of this section is to present the notation 
and concepts from DS theory that will be needed in 
the sequel. It is instructive, as well as of historical 
interest, however, to first discuss Fisher's fiducial 
argument. 

2.1 Fiducial Inference 

Consider the model described by the a-equation
(1.1), where $\Theta$ is the parameter of interest, $X$ is
a sufficient statistic rather than the observed data,
and $U$ is the auxiliary variable, referred to as a
pivotal quantity in the fiducial context. A crucial
assumption underlying the fiducial argument is that
each one of $(X, \Theta, U)$ is uniquely determined by (1.1)
given the other two. The pivotal quantity $U$ is
assumed to have an a priori distribution $\mu$,
independent of $\Theta$. Prior to the experiment, $X$ has a
sampling distribution that depends on $\Theta$; after the
experiment, however, $X$ is no longer a random
variable. To produce a posterior distribution for $\Theta$, the



STATISTICAL INFERENCE WITH WEAK BELIEFS 






variability in $X$ prior to the experiment must somehow
be transferred, after the experiment, to $\Theta$. As
in Dempster [1], we "continue to believe" that $U$ is
distributed according to $\mu$ after $X$ is observed. This
produces a distribution for $\Theta$, called the fiducial
distribution.

Example 1. To see the fiducial argument in action,
consider the problem of estimating the unknown
mean of a $N(\Theta, 1)$ population based on a
single observation $X$. In this case, we may write the
a-equation (1.1) as

$X = \Theta + \Phi^{-1}(U)$,

where $\Phi(\cdot)$ is the cumulative distribution function
(CDF) of the $N(0, 1)$ distribution, and the pivotal
quantity $U$ has a priori distribution $\mu = \mathrm{Unif}(0, 1)$.
Then, for a fixed $\theta$, the fiducial probability of
$\{\Theta \le \theta\}$ is, as Fisher [9] reasoned, determined by the
following logical sequence:

$\Theta \le \theta \iff X - \Phi^{-1}(U) \le \theta \iff U \ge \Phi(X - \theta)$.

That is, since the events $\{\Theta \le \theta\}$ and
$\{U \ge \Phi(X - \theta)\}$ are equivalent, their probabilities must be
the same; thus, the fiducial probability of $\{\Theta \le \theta\}$, as
determined by "continuing to believe," is $\Phi(\theta - X)$.
We can, therefore, conclude that the fiducial
distribution of $\Theta$, given $X$, is

(2.1) $\Theta \sim N(X, 1)$.

Note that (2.1) is exactly the objective Bayes answer
when $\Theta$ has the Jeffreys (flat) prior. A more general
result along these lines is given by Lindley [15].
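
The "continue to believe" step of Example 1 can be mimicked numerically:
draw $U$ from Unif(0, 1), solve the a-equation for the mean with $X$ held
at its observed value, and compare the empirical frequency of the event
that the mean is at most $\theta$ with the closed form $\Phi(\theta - X)$.
The sketch below does this in Python; the observed value, the threshold,
the seed and the function names are illustrative assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1); supplies Phi and its inverse


def fiducial_draws(x: float, n_draws: int, rng: random.Random) -> list:
    """Solve x = Theta + Phi^{-1}(U) for Theta, with U ~ Unif(0, 1)."""
    draws = []
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:          # inv_cdf requires 0 < u < 1
            u = rng.random()
        draws.append(x - STD_NORMAL.inv_cdf(u))
    return draws


def fiducial_cdf(theta: float, x: float) -> float:
    """Closed-form fiducial probability of {Theta <= theta}."""
    return STD_NORMAL.cdf(theta - x)


rng = random.Random(1)
x_obs, theta0 = 1.2, 0.5         # hypothetical observation and threshold
draws = fiducial_draws(x_obs, 50_000, rng)
empirical = sum(d <= theta0 for d in draws) / len(draws)
exact = fiducial_cdf(theta0, x_obs)
print(f"empirical = {empirical:.4f}, Phi(theta - x) = {exact:.4f}")
```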

For a detailed account of the development of 
Fisher's fiducial argument, criticisms of it, and a 
comprehensive list of references, see Zabell [24]. For 
more recent developments in fiducial inference, see 
Hannig [12]. 

2.2 Dempster-Shafer Inference 

The Dempster-Shafer theory is both a successor 
of Fisher's fiducial inference and a generalization 
of Bayesian inference. The foundations of DS have 
been laid out by Dempster [2-4, 6] and Shafer [18-22].
The DS theory has been influential in many
scientific areas, such as computer science and
engineering. In particular, DS has played a major role
in the theoretical and practical development of
artificial intelligence. The 2008 volume Classic Works
on the Dempster-Shafer Theory of Belief Functions
[23], edited by R. Yager and L. Liu, contains a
selection of nearly 30 influential papers on DS theory
and applications. For some recent statistical appli- 
cations of DS theory, see Denoeux [7], Kohlas and 
Monney [13] and Edlefsen, Liu and Dempster [8]. 

DS inference, like Bayes, is designed to make
probabilistic statements about $\Theta$, but it does so in a very
different way. The DS posterior distribution is not
a probability distribution on the parameter space $\mathbb{T}$
in the usual (Bayesian) sense, but a distribution on
a collection of subsets of $\mathbb{T}$. The important point is
that a specification of an a priori distribution for $\Theta$
is altogether avoided: the DS posterior comes from
an a priori distribution over this collection of subsets
of $\mathbb{X} \times \mathbb{T}$ and the DS calculus for combining evidence
and conditioning on observed data.

Recall the a-equation (1.1) where $X \in \mathbb{X}$ is the
observed data, $\Theta \in \mathbb{T}$ is the parameter of interest,
and $U \in \mathbb{U}$ is the auxiliary variable. In this setup, $X$,
$\Theta$ and $U$ are allowed to be vectors or even functions;
the nonparametric problem where the parameter of
interest is a CDF is discussed in Section 5. Here
$X$ is the full observed data and not necessarily a
reduction to a sufficient statistic as in the fiducial
context. Furthermore, unlike fiducial, the sets

(2.2) $\mathbb{T}_{x,u} = \{\theta \in \mathbb{T} : x = a(\theta, u)\}$,
      $\mathbb{U}_{x,\theta} = \{u \in \mathbb{U} : x = a(\theta, u)\}$

are not required to be singletons.

Following Shafer [18], the key elements of the DS
analysis are the frame of discernment and belief
function; Dempster [6] calls these the state space model
and the DS model, respectively. The frame of
discernment is $\mathbb{X} \times \mathbb{T}$, the space of all possible pairs
$(X, \Theta)$ of real-world quantities. The belief function
$\mathrm{Bel} : 2^{\mathbb{X} \times \mathbb{T}} \to [0, 1]$ is a set-function that assigns
numerical values to events $\mathcal{E} \subseteq \mathbb{X} \times \mathbb{T}$, meant to
represent the "degree of belief" in $\mathcal{E}$. Belief functions are
generalizations of probability measures (see Shafer
[18] for a full axiomatic development), and Shafer
[20] shows that one can conveniently construct
belief functions out of suitable measures and set-valued
mappings through a "push-forward" operation. For
our statistical inference problem, a particular
construction comes to mind, which we now describe.

Consider the set-valued mapping $M : \mathbb{U} \to 2^{\mathbb{X} \times \mathbb{T}}$
given by

(2.3) $M(U) = \{(X, \Theta) \in \mathbb{X} \times \mathbb{T} : X = a(\Theta, U)\}$.

The set $M(U)$ is called a focal element, and
contains all those data-parameter pairs $(X, \Theta)$
consistent with the model and particular choice of $U$. Let
$\mathcal{M} = \{M(U) : U \in \mathbb{U}\} \subseteq 2^{\mathbb{X} \times \mathbb{T}}$ denote the collection
of all such focal elements. Then the mapping $M(\cdot)$
in (2.3) and the pivotal measure $\mu$ on $\mathbb{U}$ together
specify a belief function

(2.4) $\mathrm{Bel}(\mathcal{E}) = \mu\{U : M(U) \subseteq \mathcal{E}\}$, $\mathcal{E} \subseteq \mathbb{X} \times \mathbb{T}$.

Some important properties of belief functions will be
described below. Here we point out that Bel in (2.4)
is the push-forward measure $\mu M^{-1}$, and this defines
a probability distribution over measurable subsets of
$\mathcal{M}$. Therefore, when $U \sim \mu$, one can think of $M(U)$
as a random set in $\mathcal{M}$ whose distribution is defined
by Bel in (2.4). Random sets will appear again in
Section 3.

The rigorous DS calculus laid out in Shafer [18], 
and reformulated for statisticians in Dempster [6], 
makes the DS analysis very attractive. A key ele- 
ment of the DS theory is Dempster's rule of com- 
bination, which allows two (independent) pieces of 
evidence, represented as belief functions on the same 
frame of discernment, to be combined in a way that 
is similar to combining probabilities via a product 
measure. While the intuition behind Dempster's rule 
is quite simple, the general expression for the com- 
bined belief function is rather complicated and is, 
therefore, omitted; see Shafer [18], Chapter 3, or 
Yager and Liu [23], Chapter 1, for the details. But 
in a statistical context, the most important type of 
belief functions to be combined with Bel in (2.4) 
are those that fix the value of either the $X$ or $\Theta$
component; this type of combination is known as
conditioning. It turns out that Dempster's rule of 
conditioning is fairly simple; see Theorem 3.6 of 
Shafer [18]. Next we outline the construction of these 
conditional belief functions, handling the two dis- 
tinct cases separately. 
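
For readers who want to see Dempster's rule in concrete form, the sketch
below implements the standard finite-frame version: each belief function
is represented by a mass function over subsets of the frame, the combined
mass of a set is proportional to the sum of products of masses over pairs
of focal sets whose intersection is that set, and mass falling on empty
intersections (conflict) is removed by renormalization. The finite frame,
the function names and the toy masses are illustrative assumptions and
are not the general construction referenced above.

```python
from itertools import product


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions, each a {frozenset: mass} dictionary."""
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c   # mass landing on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidence cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


def belief(m: dict, event: frozenset) -> float:
    """Total mass of focal sets contained in the event."""
    return sum(v for a, v in m.items() if a <= event)


def plausibility(m: dict, event: frozenset) -> float:
    """Total mass of focal sets that intersect the event."""
    return sum(v for a, v in m.items() if a & event)


frame = frozenset({"a", "b", "c"})                          # toy frame
m1 = {frozenset({"a"}): 0.6, frame: 0.4}                     # toy evidence 1
m2 = {frozenset({"a", "b"}): 0.7, frozenset({"c"}): 0.3}     # toy evidence 2
m12 = dempster_combine(m1, m2)
for focal_set, mass in m12.items():
    print(sorted(focal_set), round(mass, 4))
```

Conditioning corresponds to the special case in which one of the two mass
functions puts all of its mass on a single set.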

Condition on $\Theta$. Here we combine the belief
function (2.4) with another based on the information
$\Theta = \theta$. Start with the trivial (constant) set-valued
mapping

$M_0(U) = \{(X, \Theta) : \Theta = \theta\}$.

This, together with the mapping $M$ in (2.3), gives
a combined focal element

$M_0(U) \cap M(U) = \{(X, \theta) : X = a(\theta, U)\}$,

the $\theta$-cross section of $M(U)$, which we project down
to the $X$-margin to give

(2.5) $M_\theta(U) = \{X : X = a(\theta, U)\} \subseteq \mathbb{X}$.

Let $A$ be a measurable subset of $\mathbb{X}$. It can be shown
that the conditional belief function $\mathrm{Bel}_\theta$ can be
obtained by applying the same rule as in (2.4) but with
$M_\theta(U)$ in place of $M(U)$. That is, the conditional
belief function, given $\Theta = \theta$, is given by

(2.6) $\mathrm{Bel}_\theta(A) = \mu\{U : M_\theta(U) \subseteq A\} = \mu\{U : a(\theta, U) \in A\}$,

the push-forward measure defined by $\mu$ and the
mapping $a(\theta, \cdot)$, which is how the sampling distribution
is defined. Therefore, given $\Theta = \theta$, the conditional
belief function $\mathrm{Bel}_\theta(\cdot)$ is just the sampling
distribution $P_\theta(\cdot)$.

Condition on $X$. For given $X = x$, we proceed just
as before; that is, start with the trivial (constant)
set-valued mapping

$M_0(U) = \{(X, \Theta) : X = x\}$

and combine this with $M(U)$ in (2.3) to obtain a
new posterior focal element

$M_0(U) \cap M(U) = \{(x, \Theta) : x = a(\Theta, U)\}$,

the $x$-cross section of $M(U)$, which we project down
to the $\Theta$ margin to give

(2.7) $M_x(U) = \{\Theta : x = a(\Theta, U)\} \subseteq \mathbb{T}$.

Unlike the "condition on $\Theta$" case above, this
posterior focal element can, in general, be empty, a
so-called conflict case. Dempster's rule of combination
will effectively remove these conflict cases by
conditioning on the event that $M_x(U) \ne \varnothing$; see
Dempster [3]. In this case, for an assertion, or hypothesis,
$A \subseteq \mathbb{T}$, the DS posterior belief function $\mathrm{Bel}_x$ is
defined as

(2.8) $\mathrm{Bel}_x(A) = \dfrac{\mu\{U : M_x(U) \subseteq A\}}{\mu\{U : M_x(U) \ne \varnothing\}}$.

We now turn to some important properties of $\mathrm{Bel}_x$.
In Shafer's axiomatic development, belief functions
are nonadditive, which implies

(2.9) $\mathrm{Bel}_x(A) + \mathrm{Bel}_x(A^c) \le 1$ for all $A$,

with equality if and only if $\mathrm{Bel}_x$ is an ordinary
additive probability. The intuition here is that evidence
not in favor of $A^c$ need not be in favor of $A$. If we
define the plausibility function as

(2.10) $\mathrm{Pl}_x(A) = 1 - \mathrm{Bel}_x(A^c)$,

then it is immediately clear from (2.9) that

$\mathrm{Bel}_x(A) \le \mathrm{Pl}_x(A)$ for all $A$.

For this reason, $\mathrm{Bel}_x(A)$ and $\mathrm{Pl}_x(A)$ have often been
called, respectively, the lower and upper probabilities
of $A$ given $X = x$. In our statistical context, $A$
plays the role of a hypothesis about the unknown
parameter $\Theta$ of interest. So for any relevant
assertion $A$, the posterior belief and plausibility functions
$\mathrm{Bel}_x(A)$ and $\mathrm{Pl}_x(A)$ can be calculated, and
conclusions are reached based on the relative magnitudes
of these quantities.

We have been writing "$X = x$" to emphasize that
the posterior focal elements and belief function are
conditional on a fixed observed value $x$ of $X$. But
later we will consider sampling properties of the
posterior belief function, for fixed $A$, as a function of
the random variable $X$, so, henceforth, we will write
$M_X(U)$ for $M_x(U)$ in (2.7), and $\mathrm{Bel}_X$ for $\mathrm{Bel}_x$ in
(2.8).

Example 2. Consider again the problem in
Example 1 of making inference on the unknown mean
$\Theta$ of a Gaussian population $N(\Theta, 1)$ based on a
single observation $X$. We can use the a-equation
$X = \Theta + \Phi^{-1}(U)$, where $U \sim \mu = \mathrm{Unif}(0, 1)$. The
focal elements $M(U)$ in (2.3) are the lines

$M(U) = \{(X, \Theta) : X = \Theta + \Phi^{-1}(U)\}$.

Given $X$, the focal elements $M_X(U) = \{X - \Phi^{-1}(U)\}$
in (2.7) are singletons. Since $U \sim \mathrm{Unif}(0, 1)$, the
posterior belief function

$\mathrm{Bel}_X(A) = \mu\{U : X - \Phi^{-1}(U) \in A\}$

is the probability that an $N(X, 1)$ distributed
random variable falls in $A$, which is the same as the
objective Bayes and fiducial posterior. Note also that
this approach is different from that suggested by
Dempster [2] and described in detail in Dempster
[5].

Example 3. Suppose that the binary data
$X = (X_1, \ldots, X_n)$ consists of independent Bernoulli
observations, and $\Theta \in [0, 1]$ represents the unknown
probability of success. Dempster [2] considered the
sampling model determined by the a-equation

(2.11) $X_i = I_{\{U_i \le \Theta\}}$, $i = 1, \ldots, n$,

where $I_A$ denotes the indicator of the event $A$, and
the auxiliary variable $U = (U_1, \ldots, U_n)$ has pivotal
measure $\mu = \mathrm{Unif}([0, 1]^n)$. The belief function will
have generic focal elements

$M(U) = \{(X, \Theta) : X_i = I_{\{U_i \le \Theta\}} \ \forall i = 1, \ldots, n\}$.

This definition of the focal element is quite formal,
but looking more carefully at the a-equation (2.11)
casts more light on the relationships between $X_i$, $U_i$
and $\Theta$. Indeed, we know that:

• if $X_i = 1$, then $\Theta \ge U_i$, and

• if $X_i = 0$, then $\Theta < U_i$.

Letting $N_X = \sum_{i=1}^n X_i$ be the number of successes
in the $n$ Bernoulli trials, it is clear that exactly $N_X$
of the $U_i$'s are smaller than $\Theta$, and the remaining
$n - N_X$ are greater than $\Theta$. There is nothing
particularly important about the indices of the $U_i$'s, so
throwing out conflict cases reduces the problem from
the binary vector $X$ and uniform variates $U$ to the
success count $N = N_X$ and ordered uniform variates;
see Dempster [2] for a detailed argument. Let $U_{(i)}$
denote the $i$th order statistic from $U_1, \ldots, U_n$, with
$U_{(0)} := 0$ and $U_{(n+1)} := 1$. Then the focal element
$M(U)$ above reduces to

$M(U) = \{(N, \Theta) : U_{(N)} \le \Theta \le U_{(N+1)}\}$, $U \in [0, 1]^n$.

Figure 1 gives a graphical representation of this generic
focal element. Now given $N$, the posterior belief
function has focal elements

(2.12) $M_N(U) = \{\Theta : U_{(N)} \le \Theta \le U_{(N+1)}\}$, $U \in [0, 1]^n$,

which are intervals (the horizontal lines in Figure 1)
compared to the singletons in Example 2. Consider
the assertion $A_\theta = \{\Theta \le \theta\}$ for $\theta \in [0, 1]$. The
posterior belief and plausibility functions for $A_\theta$ are given
by

$\mathrm{Bel}_N(A_\theta) = \mu\{U \in [0, 1]^n : U_{(N+1)} \le \theta\}$,
$\mathrm{Pl}_N(A_\theta) = 1 - \mu\{U \in [0, 1]^n : U_{(N)} > \theta\}$.

When $N$ is fixed, the marginal beta distributions
of $U_{(N)}$ and $U_{(N+1)}$ are available and $\mathrm{Bel}_X(A_\theta)$ and
$\mathrm{Pl}_X(A_\theta)$ can be readily calculated. Plots for the case
of $n = 12$ and observed $N = 7$ can be seen in Figure 3
(Example 5 in Section 3.3).
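
The belief and plausibility of Example 3 can be evaluated without any
special-function library, because the event that the $k$th order statistic
of $n$ independent uniforms is at most $t$ is the event that at least $k$ of
the uniforms fall at or below $t$, which has a binomial tail probability
(equivalently, a beta CDF). The sketch below uses this identity; the
function names and the grid of parameter values are illustrative
assumptions.

```python
from math import comb


def order_stat_cdf(k: int, n: int, t: float) -> float:
    """P(U_(k) <= t) for the k-th order statistic of n iid Unif(0, 1)
    draws, with the conventions U_(0) = 0 and U_(n+1) = 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 1.0 if t >= 1.0 else 0.0
    t = min(max(t, 0.0), 1.0)
    # At least k of the n uniforms must fall at or below t.
    return sum(comb(n, j) * t**j * (1.0 - t) ** (n - j) for j in range(k, n + 1))


def bel(theta: float, n: int, N: int) -> float:
    """Belief of {Theta <= theta}: P(U_(N+1) <= theta)."""
    return order_stat_cdf(N + 1, n, theta)


def pl(theta: float, n: int, N: int) -> float:
    """Plausibility of {Theta <= theta}: 1 - P(U_(N) > theta)."""
    return order_stat_cdf(N, n, theta)


n, N = 12, 7   # the case plotted in the paper's Figure 3
for theta in (0.3, 0.5, 0.7):
    print(f"theta={theta:.1f}  Bel={bel(theta, n, N):.4f}  Pl={pl(theta, n, N):.4f}")
```

The gap between the two curves reflects the fact that the posterior focal
elements are intervals rather than points.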

Fig. 1. A focal element $M(U)$ for the Bernoulli data
problem in Example 3, with $n = 7$. A posterior focal element is a
horizontal line segment, the $\Theta$-interval determined by fixing
the value of $N = N_X$.

Next are two important remarks about the
conventional DS analysis just described:

• The examples thus far have considered only "dull"
assertions, such as $A = \{\Theta \le \theta\}$, where
conventional DS performs fairly well. But for "sharp"
assertions, such as $A = \{\Theta = \theta\}$, particularly in
high-dimensional problems, conventional DS can
be too strong, resulting in plausibilities $\mathrm{Pl}_X(A) \approx 0$
that are of no practical use.

• For fixed $A$, $\mathrm{Bel}_X(A)$ has no built-in long-run
frequency properties as a function of $X$. Therefore,
rules like "reject $A$ if $\mathrm{Pl}_X(A) < 0.05$ or,
equivalently, if $\mathrm{Bel}_X(A^c) > 0.95$" have no guaranteed
long-run error rates, so designing statistical
methodology around conventional DS may be
challenging.

It turns out that both of these problems can be
taken care of by shrinking $\mathrm{Bel}_X$ in (2.8). We do this
in Section 3 by suitably weakening the conventional
DS belief, replacing the pivotal measure $\mu$ with a
belief function.

3. INFERENCE WITH WEAK BELIEFS 
3.1 Inferential Models 

The conventional DS analysis of the previous sec- 
tion achieves the lofty goal of providing posterior 
probability-based inference without prior specifica- 
tion, but the difficulties mentioned at the end of 
Section 2.2 have kept DS from breaking into the sta- 
tistical mainstream. Our basic premise is that these 
obstacles can be overcome by relaxing the crucial 
"continue to believe" assumption. The concept of 
inferential models (IMs) will formalize this idea. 

Let $\mathrm{Bel}_X$ denote the posterior belief function (2.8)
of the conventional DS analysis in Section 2.2, and
let $\mathrm{Bel}^*$ be another belief function on the parameter
space $\mathbb{T}$, possibly depending on $X$. For any
assertion $A$ of interest, $\mathrm{Bel}^*(A)$ can be calculated and,
at least in principle, used to make inference on the
unknown $\Theta$. We say that $\mathrm{Bel}^*$ specifies an IM on $\mathbb{T}$
if

(3.1) $\mathrm{Bel}^*(A) \le \mathrm{Bel}_X(A)$ for all $A$.

Since $\mathrm{Bel}^*$ has plausibility $\mathrm{Pl}^*(A) = 1 - \mathrm{Bel}^*(A^c)$,
it is clear from (3.1) that $\mathrm{Pl}^*(A) \ge \mathrm{Pl}_X(A)$ for all
$A$. Therefore, an IM can have meaningful nonzero
plausibility even for sharp assertions. Shrinking the
belief function can be done by suitably modifying
the focal element mapping $M(\cdot)$ or the pivotal
measure $\mu$, but any other technique that generates a
belief function bounded by $\mathrm{Bel}_X$ would also produce a
valid IM.

$\mathrm{Bel}_X$ itself specifies an IM, but is a very extreme
case. At the opposite extreme is the vacuous belief
function with $\mathrm{Bel}^*(A) = 0$ for all $A \ne \mathbb{T}$. Clearly,
neither of these IMs would be fully satisfactory in
general. The goal is to choose an IM that falls
somewhere in between these two extremes.

In the next subsection we use IMs to motivate the 
method of weak beliefs, due to Zhang and Liu [25]. 
That is, we apply their WB method to construct a 
particular class of IMs and, in Section 3.4, we show 
how a particular IM can be chosen. 

3.2 Weak Beliefs 

Section 1 described how the a-equation might be
used for data generation: fix $\Theta$, sample $U$ from the
pivotal measure $\mu$, and compute $X = a(\Theta, U)$. Now,
for the inference problem, suppose that the observed
data $X$ was, indeed, generated according to this
recipe, but the corresponding values of $\Theta$ and $U$
remain hidden. Denote by $U^*$ the value of the
unobserved auxiliary variable; see (3.2). The key point
is that knowing $\Theta$ is equivalent to knowing $U^*$; in
other words, inference on $\Theta$ is equivalent to
predicting the value of the unobserved $U^*$. Both the fiducial
and DS theories are based on this idea of shifting
the problem of inference on $\Theta$ to one of predicting
$U^*$, although, to our knowledge, neither method has
been described in this way before. The advantage of
focusing on $U^*$ is that the a priori distribution for
$U^*$ is fully specified by the sampling model.

More formally, if the sampling model $P_\Theta$ is
specified by the a-equation (1.1), then the following
relation must hold after $X$ is observed:

(3.2) $X = a(\Theta, U^*)$,

where $\Theta$ is unknown and $U^*$ is unobserved. We can
"solve" this equation for $\Theta$ to get

(3.3) $\Theta \in A(X, U^*)$,

where $A(\cdot, \cdot)$ is a set-valued map. Intuitively, (3.3)
identifies those parameter values which are
consistent with the observed $X$. For example, in the
normal mean problem of Example 1, once $X$ has been
observed, there is a one-to-one relationship between
the unknown mean $\Theta$ and the unobserved $U^*$, that
is, $\Theta \in A(X, U^*) = \{X - \Phi^{-1}(U^*)\}$, so, given $U^*$,
one can immediately find $\Theta$. Therefore, if we could
predict $U^*$, then we could know $\Theta$ exactly. The
crucial "continue to believe" assumption of fiducial and
DS says that $U^*$ can be predicted by taking draws
$U$ from the pivotal measure $\mu$. WB weakens this
assumption by replacing the draw $U \sim \mu$ with a set
$S(U)$ containing $U$, which is equivalent to replacing
$\mu$ with a belief function.

Recall from Section 2.2 that a measure and
set-valued mapping together define a belief function.
Here we fix $\mu$ to be the pivotal measure, and
construct a belief function on $\mathbb{U}$ by choosing a set-valued
mapping $S : \mathbb{U} \to 2^{\mathbb{U}}$ that satisfies $U \in S(U)$. This is
not the same as the DS analysis described in Section
2.2; there the belief function was fully specified by
the sampling model, but here we must make a
subjective choice of $S$. We call this pair $(\mu, S)$ a belief,
as it generates a belief function $\mu S^{-1}$ on $\mathbb{U}$.
Intuitively, $(\mu, S)$ determines how aggressive we would
like to be in predicting the unobserved $U^*$; more
aggressive means smaller $S(U)$, and vice versa. We will
call $S(U)$, as a function of $U \sim \mu$, a predictive
random set (PRS), and we can think of the inference
problem as trying to hit $U^*$ with the PRS $S(U)$.

The two extreme IMs, the DS posterior belief
function $\mathrm{Bel}_X$ in (2.8) and the vacuous belief function,
are special cases of this general framework; take
$S(U) = \{U\}$ for the former, and $S(U) = \mathbb{U}$ for the
latter. So in this setting we see that the quality of
the IM is determined by how well the PRS $S(U)$
can predict $U^*$. With this new interpretation, we
can explain the comment at the end of Section 2.2
about the quality of conventional DS for sharp
assertions in high-dimensional problems. Generally,
high-dimensional $\Theta$ goes hand-in-hand with
high-dimensional $U$, and accurate estimates of $\Theta$
require accurate prediction of $U^*$. But the curse of
dimensionality states that, as the dimension increases,
so too does the probabilistic distance between $U^*$
and a random point $U$ in $\mathbb{U}$. Consequently, the tiny
(sharp) assertion $A$ will rarely, if ever, be hit by the
focal elements $M_X(U)$.

In Section 3.4 we give a general WB framework,
show how a particular $S$ can be chosen, and
establish some desirable long-run frequency properties of
the weakened posterior belief function. But first, in
Section 3.3, we develop WB inference for given $S$
and give some illustrative examples.

3.3 Belief Functions and WB

In this section we show how to incorporate WB
into the DS analysis described in Section 2.2.
Suppose that a map $S$ is given. The case $S(U) = \{U\}$
was taken care of in Section 2.2, so what follows will
be familiar. But this formal development of the WB
approach will highlight two interesting and
important properties, consequences of Dempster's
conditioning operation.

Previously, we have taken the frame of
discernment to be $\mathbb{X} \times \mathbb{T}$. Here we have additional
uncertainty about $U^* \in \mathbb{U}$, so first we will extend this to
the larger frame $\mathbb{X} \times \mathbb{T} \times \mathbb{U}$. The belief function on
$\mathbb{U}$ has focal elements

$\{U^* \in \mathbb{U} : U^* \in S(U)\}$,

which correspond to cylinders in the larger frame,
that is,

$\{(X, \Theta, U^*) : U^* \in S(U)\}$.

Likewise, extend the focal elements $M(U)$ in (2.3)
to cylinders in the larger frame with focal elements

$\{(X, \Theta, U^*) : X = a(\Theta, U^*)\}$.

(The belief functions to which these extended focal
elements correspond are implicitly formed by
combining the particular belief function with the
vacuous belief function on the opposite margin.)
Combining these extended focal elements, and
simultaneously marginalizing over $\mathbb{U}$, gives a new focal
element on the original frame $\mathbb{X} \times \mathbb{T}$, namely,

(3.4) $M(U; S) = \{(X, \Theta) : X = a(\Theta, u),\ u \in S(U)\} = \bigcup \{M(u) : u \in S(U)\}$,

where $M(\cdot)$ is the focal mapping defined in (2.3).
Immediately we see that the focal element $M(U; S)$
in (3.4) is an expanded version of $M(U)$ in (2.3).
The measure $\mu$ and the mapping $M(U; S)$ generate
a new belief function over $\mathbb{X} \times \mathbb{T}$:

$\mathrm{Bel}(\mathcal{E}; S) = \mu\{U : M(U; S) \subseteq \mathcal{E}\}$.

Since $M(U) \subseteq M(U; S)$ for all $U$, it is clear that
$\mathrm{Bel}(\mathcal{E}; S) \le \mathrm{Bel}(\mathcal{E})$. The two DS conditioning
operations will highlight the importance of this point.
Condition on $\Theta$. Conditioning on a fixed $\Theta = \theta$,
the focal elements (as subsets of $\mathbb{X}$) become

$M_\theta(U; S) = \{X : X = a(\theta, u),\ u \in S(U)\} = \bigcup \{M_\theta(u) : u \in S(U)\}$.

This generates a new (predictive) belief function
$\mathrm{Bel}_\theta(\cdot; S)$ that satisfies

$\mathrm{Bel}_\theta(A; S) = \mu\{U : M_\theta(U; S) \subseteq A\} \le \mu\{U : M_\theta(U) \subseteq A\} = \mathrm{Bel}_\theta(A) = P_\theta(A)$.

Therefore, in the WB framework, this conditional
belief function need not coincide with the sampling
model as it does in the conventional DS context.
But the sampling model $P_\theta(\cdot)$ is compatible with the
belief function $\mathrm{Bel}_\theta(\cdot; S)$ in the sense that

$\mathrm{Bel}_\theta(\cdot; S) \le P_\theta(\cdot) \le \mathrm{Pl}_\theta(\cdot; S)$.

If we think about probability as a precise measure of
uncertainty, then, intuitively, when we weaken our
measure of uncertainty about $U^*$ by replacing $\mu$ with
a belief function, we expect a similar smearing
of our uncertainty about the value of $X$ that will be
ultimately observed.
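
The compatibility of the sampling model with the weakened predictive
belief can be checked by simulation. The sketch below takes the normal
mean model of Example 1, with sampling model $N(\theta, 1)$, and, purely as
an illustrative assumption, the interval map that sends $U$ to
$[U - \omega U, U + \omega(1 - U)]$ introduced in Example 4. For the event
that $X$ is at most a threshold it estimates the belief, the sampling
probability and the plausibility from the same draws of $U$. The threshold,
the value of $\omega$, the seed and the function name are hypothetical
choices.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # N(0, 1)


def predictive_bounds(theta, x0, omega, n_draws=50_000, seed=0):
    """Monte Carlo estimates of (Bel, P, Pl) for the event {X <= x0},
    given Theta = theta, under the interval predictive random set.
    Requires 0 <= omega < 1."""
    rng = random.Random(seed)
    hits_bel = hits_prob = hits_pl = 0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:                      # inv_cdf requires 0 < u < 1
            u = rng.random()
        lo_u = u - omega * u                 # lower endpoint of S(U)
        hi_u = u + omega * (1.0 - u)         # upper endpoint of S(U)
        lower = theta + PHI.inv_cdf(lo_u)    # smallest X in the focal element
        upper = theta + PHI.inv_cdf(hi_u)    # largest X in the focal element
        point = theta + PHI.inv_cdf(u)       # conventional (singleton) draw
        hits_bel += upper <= x0              # focal element inside the event
        hits_prob += point <= x0             # ordinary sampling model
        hits_pl += lower <= x0               # focal element meets the event
    return hits_bel / n_draws, hits_prob / n_draws, hits_pl / n_draws


bel_hat, prob_hat, pl_hat = predictive_bounds(theta=0.0, x0=0.5, omega=0.25)
print(f"Bel={bel_hat:.4f} <= P={prob_hat:.4f} <= Pl={pl_hat:.4f}")
```

Because the three indicators are nested draw by draw, the ordering holds
for the estimates themselves and not only in the limit.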

Condition on $X$. Conditioning on the observed $X$,
the focal elements (as subsets of $\mathbb{T}$) become

$M_X(U; S) = \{\Theta : X = a(\Theta, u),\ u \in S(U)\} = \bigcup \{M_X(u) : u \in S(U)\}$.

Evidently, $M_X(U; S)$ is just an expanded version of
$M_X(U)$ in (2.7). But a larger focal element will be
less likely to fall completely within $A$ or $A^c$. Indeed,
the larger $M_X(U; S)$ generates a new posterior belief
function $\mathrm{Bel}_X(\cdot; S)$ which satisfies

(3.5) $\mathrm{Bel}_X(A; S) = \mu\{U : M_X(U; S) \subseteq A\} \le \mu\{U : M_X(U) \subseteq A\} = \mathrm{Bel}_X(A)$.

Therefore, $\mathrm{Bel}_X(\cdot; S)$ is a bona fide IM according to
(3.1).

There are many possible maps $S$ that could be
used. In the next two examples we utilize one
relatively simple idea: using an interval/rectangle
$S(U) = [A(U), B(U)]$ to predict $U^*$.

Example 4. Consider again the normal mean 
problem in Example 1. The posterior belief function 
was derived in Example 2 and shown to be the same 



as the objective Bayes posterior. Here we consider a 
WB analysis where the set-valued mapping S = 5 W 
is given by 

(3.6) S{U) = [U-uU,U + u(l-U)] t 

w€ [0,1]. 

It is clear that the cases u = and uj = 1 corre- 
spond to the conventional and vacuous beliefs, re- 
spectively. Here we will work out the posterior be- 
lief function for u € (0, 1) and compare the result 
to that in Example 2. Recall that the posterior fo- 
cal elements in Example 2 were singletons Mx{U) = 
{0 : = X - (U)}. It is easy to check that the 
weakened posterior focal elements are intervals of 
the form 

M X {U; S) = \J{M x (u) : u G S(U)} 

= [X -^{U + uil-U)), 
X -$ _1 (17 -ujU)}. 

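Consider the sequence of assertions $A_\theta = \{\Theta \le \theta\}$.
We can derive analytical formulas for $\mathrm{Bel}_X(A_\theta;S)$ and
$\mathrm{Pl}_X(A_\theta;S)$ as functions of $\theta$:

$$\mathrm{Bel}_X(A_\theta;S) = \Bigl[1 - \frac{\Phi(X-\theta)}{1-\omega}\Bigr]^+, \qquad \mathrm{Pl}_X(A_\theta;S) = 1 - \Bigl[\frac{\Phi(X-\theta) - \omega}{1-\omega}\Bigr]^+, \tag{3.7}$$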

where $x^+ = \max\{0, x\}$. Plots of these functions are
shown in Figure 2, for $\omega \in \{0, 0.25, 0.5\}$, when $X = 1.2$
is observed. Here we see that as $\omega$ increases, the
spread between the belief and plausibility curves
increases. Therefore, one can interpret the parameter
$\omega$ as a degree of weakening.
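
The formulas in (3.7) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the pivotal measure for $U$ is Unif(0,1), as the use of $\Phi^{-1}(U)$ suggests, and the function names are hypothetical. It evaluates the closed form and compares it with a Monte Carlo estimate obtained by simulating the PRS $S(U)$ in (3.6).

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution; PHI.cdf plays the role of Phi


def bel_pl_closed_form(theta, x, omega):
    """Belief and plausibility of {Theta <= theta} from the closed form (3.7)."""
    if omega >= 1.0:  # vacuous belief: S(U) = [0, 1]
        return 0.0, 1.0
    c = PHI.cdf(x - theta)
    bel = max(0.0, 1.0 - c / (1.0 - omega))
    pl = 1.0 - max(0.0, (c - omega) / (1.0 - omega))
    return bel, pl


def bel_pl_monte_carlo(theta, x, omega, draws=100_000, seed=0):
    """Same quantities, estimated by simulating the PRS S(U) in (3.6)."""
    rng = random.Random(seed)
    c = PHI.cdf(x - theta)
    bel_hits = pl_hits = 0
    for _ in range(draws):
        u = rng.random()  # assumed pivotal measure: U ~ Unif(0, 1)
        lo_u = u - omega * u
        hi_u = u + omega * (1.0 - u)
        # Focal element: [x - Phi^{-1}(hi_u), x - Phi^{-1}(lo_u)].
        # Its upper end is <= theta iff lo_u >= c (counts toward belief);
        # its lower end is <= theta iff hi_u >= c (counts toward plausibility).
        bel_hits += lo_u >= c
        pl_hits += hi_u >= c
    return bel_hits / draws, pl_hits / draws


if __name__ == "__main__":
    for omega in (0.0, 0.25, 0.5):
        print(omega, bel_pl_closed_form(1.0, 1.2, omega),
              bel_pl_monte_carlo(1.0, 1.2, omega))
```

Working on the $u$-scale avoids evaluating $\Phi^{-1}$ at 0 or 1, which is why the simulation compares $U - \omega U$ and $U + \omega(1-U)$ with $\Phi(X - \theta)$ directly.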

Fig. 2. Plots of belief and plausibility, as functions of $\theta$, for
assertions $A_\theta = \{\Theta \le \theta\}$ for $X = 1.2$ and $\omega \in \{0, 0.25, 0.5\}$ in
the normal mean problem in Example 4. The case $\omega = 0$ was
considered in Example 1.

Example 5. Consider again the Bernoulli problem
from Example 3. In this setup, the auxiliary
variable $U = (U_1,\dots,U_n)$ in $\mathbb{U} = [0,1]^n$ is vector-valued.
We apply a similar weakening principle as
in Example 4, where we use a rectangle to predict
$U^*$. That is, fix $\omega \in [0,1]$ and define $S = S_\omega$ as

$$S(U) = [A_1(U), B_1(U)] \times \cdots \times [A_n(U), B_n(U)],$$

a Cartesian product of intervals like that in Example
4, where

$$A_i(U) = U_i - \omega U_i, \qquad B_i(U) = U_i + \omega(1 - U_i).$$

Following the DS argument in Example 3, it is not
difficult to check that the (weakened) posterior focal
elements are of the form

$$M_N(U;S) = \bigl[U_{(N)} - \omega U_{(N)},\ U_{(N+1)} + \omega(1 - U_{(N+1)})\bigr],$$

an expanded version of the focal element $M_X(U)$
in (2.12). Computation of the belief and plausibility
can still be facilitated using the marginal beta
distributions of $U_{(N)}$ and $U_{(N+1)}$. For example, consider
the sequence of assertions $A_\theta = \{\Theta \le \theta\}$, $\theta \in [0,1]$.
Plots of $\mathrm{Bel}_N(A_\theta;S)$ and $\mathrm{Pl}_N(A_\theta;S)$, as functions of
$\theta$, are given in Figure 3 for $\omega = 0$ (which is the
conventional belief situation in Example 3) and $\omega = 0.1$,
when $n = 12$ and $N = 7$. As expected, the distance
between the belief and plausibility curves is greater
for the latter case. But this naive construction of $S$
is not the only approach; see Zhang and Liu [25] for
a more efficient alternative based on a well-known
relationship between the binomial and beta CDFs.
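
As an illustration of the naive construction, the sketch below estimates the belief and plausibility of $A_\theta$ by simulating ordered uniforms rather than using the marginal beta distributions. It is only a sketch under stated assumptions: the function name is hypothetical, and the boundary conventions $U_{(0)} = 0$ and $U_{(n+1)} = 1$ are assumed here for the cases $N = 0$ and $N = n$.

```python
import random


def bel_pl_bernoulli(theta, n, n_success, omega, draws=20_000, seed=0):
    """Monte Carlo belief/plausibility of {Theta <= theta} for the weakened
    focal element [U_(N) - w U_(N), U_(N+1) + w (1 - U_(N+1))]."""
    rng = random.Random(seed)
    bel_hits = pl_hits = 0
    for _ in range(draws):
        u = sorted(rng.random() for _ in range(n))
        # Assumed conventions: U_(0) = 0 and U_(n+1) = 1.
        u_lo = u[n_success - 1] if n_success >= 1 else 0.0  # U_(N)
        u_hi = u[n_success] if n_success < n else 1.0       # U_(N+1)
        lower = u_lo - omega * u_lo
        upper = u_hi + omega * (1.0 - u_hi)
        bel_hits += upper <= theta  # focal element lies inside A_theta
        pl_hits += lower <= theta   # focal element intersects A_theta
    return bel_hits / draws, pl_hits / draws


if __name__ == "__main__":
    for omega in (0.0, 0.1):
        print(omega, bel_pl_bernoulli(0.5, 12, 7, omega))
```

With a common seed the same uniforms are reused for every $\omega$, so the widening of the gap between belief and plausibility as $\omega$ grows is visible draw by draw.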

3.4 The Method of Maximal Belief 

The WB analysis for a given set-valued map S 
was described in Section 3.3. But how should one 
choose S so that the posterior belief function sat- 
isfies certain desirable properties? Roughly speak- 
ing, the idea is to choose a map S with the "small- 
est" PRSs S(U) with the desired coverage probabil- 
ity. Following Zhang and Liu [25], we call this the 
method of maximal belief (MB). 

Fig. 3. Plots of belief and plausibility, as functions of $\theta$,
for assertions $A_\theta = \{\Theta \le \theta\}$ when $n = 12$ and $N = 7$ and
$\omega \in \{0, 0.1\}$, in the Bernoulli success probability problem in
Example 5. The case $\omega = 0$ was considered in Example 3.

Consider a general class of beliefs $\mathcal{B} = (\mu, \mathcal{S})$,
where $\mu$ is the pivotal measure from Section 1, and
$\mathcal{S} = \{S_\omega : \omega \in \Omega\}$ is a class of set-valued mappings
indexed by $\Omega$. Each $S_\omega$ in $\mathcal{S}$ maps points $u \in \mathbb{U}$
to subsets $S_\omega(u) \subseteq \mathbb{U}$ and, together with the pivotal
measure $\mu$, determines a belief function $\mu S_\omega^{-1}$ on $\mathbb{U}$
and, in turn, a posterior belief function $\mathrm{Bel}_X(\cdot\,;S_\omega)$
on $\mathbb{T}$ as in Section 3.3. For a given class of beliefs,
it remains to choose a particular map $S_\omega$ or, equivalently,
an index $\omega \in \Omega$, with the appropriate credibility
and efficiency properties. To this end, define

$$Q_\omega(u) = \mu\{U : S_\omega(U) \not\ni u\}, \quad u \in \mathbb{U}, \tag{3.8}$$

which is the probability that the PRS $S_\omega(U)$ misses
the target $u \in \mathbb{U}$. We want to choose $S_\omega$ in such a
way that the random variable $Q_\omega(U^*)$, a function of
$U^* \sim \mu$, is stochastically small.

Definition 1. A belief $(\mu, S_\omega)$ is credible at level
$\alpha \in (0,1)$ if

$$\varphi_\alpha(\omega) := \mu\{U^* : Q_\omega(U^*) \ge 1 - \alpha\} \le \alpha. \tag{3.9}$$

Note the similarity between credibility and the 
control of Type-I error in the frequentist context of 
hypothesis testing. That is, if $(\mu, S_\omega)$ is credible at level
$\alpha = 0.05$, then in a sequence of 100 similar inference
problems, each having different $U^*$, we expect $Q_\omega$
(the probability that the PRS misses its target)
to exceed 0.95 in no more than 5 of these cases. The
analogy with frequentist hypothesis testing is made 
here only to offer a way of understanding credibility. 

It is not immediately clear why this notion of cred- 
ibility is meaningful for the problem of inference on 
the unknown parameter $\Theta$. The following theorem,
an extension of Theorem 3.1 in Zhang and Liu [25],
gives conditions under which $\mathrm{Bel}_X(\cdot\,;S)$ has desirable
long-run frequency properties in repeated $X$-sampling.

Theorem 1. Suppose $(\mu, S)$ is credible at level
$\alpha \in (0,1)$, and that $\mu\{U : M_X(U;S) \ne \emptyset\} = 1$. Then,
for any assertion $A \subseteq \mathbb{T}$, the posterior belief function
$\mathrm{Bel}_X(A;S)$ in (3.5), as a function of $X$, satisfies

$$P_\Theta\{\mathrm{Bel}_X(A;S) \ge 1 - \alpha\} \le \alpha, \quad \Theta \in A^c. \tag{3.10}$$

We can again make a connection to frequentist
hypothesis testing, but this time in terms of
assertions/hypotheses $A$ in the parameter space. If we
adopt the decision rule "conclude $\Theta \notin A$ if
$\mathrm{Pl}_X(A;S) \le 0.05$," then under the conditions of
Theorem 1 we have

$$P_\Theta\{\mathrm{Pl}_X(A;S) \le 0.05\} \le 0.05, \quad \Theta \in A.$$

That is, if $A$ does contain the true $\Theta$, then we will
"reject" $A$ no more than 5% of the time in repeated
experiments, which is analogous to Type-I error
probabilities in the frequentist testing domain. So the
importance of Theorem 1 is that it equates credibility
of the belief $(\mu, S)$ to long-run error rates
of belief/plausibility function-based decision rules.
For example, the belief $(\mu, S_\omega)$ in (3.6) is credible
for $\omega \in [0.5, 1]$, so decision rules based on (3.7) will
have controlled error rates in the sense of (3.10). But
remember that belief functions are posterior quantities
that contain problem-specific evidence about
the parameter of interest.
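
The credibility claim for (3.6) can be examined numerically. The sketch below is illustrative only: it assumes $\mu$ is Unif(0,1), the function names are hypothetical, and the closed form for $Q_\omega(u)$ is derived here for this sketch rather than taken from the text, namely $Q_\omega(u) = [1 - u/(1-\omega)]^+ + [(u-\omega)/(1-\omega)]^+$. It then estimates $\varphi_\alpha(\omega)$ in (3.9) by Monte Carlo.

```python
import random


def noncoverage(u, omega):
    """Q_omega(u) = mu{U : S_omega(U) misses u} for the interval PRS (3.6),
    assuming U ~ Unif(0, 1). Closed form derived for this sketch."""
    if omega >= 1.0:  # vacuous belief: S(U) = [0, 1] never misses
        return 0.0
    below = max(0.0, 1.0 - u / (1.0 - omega))      # mu{U - omega U > u}
    above = max(0.0, (u - omega) / (1.0 - omega))  # mu{U + omega (1 - U) < u}
    return below + above


def phi(alpha, omega, draws=100_000, seed=0):
    """Monte Carlo estimate of phi_alpha(omega) = mu{U*: Q_omega(U*) >= 1 - alpha}."""
    rng = random.Random(seed)
    hits = sum(noncoverage(rng.random(), omega) >= 1.0 - alpha
               for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    for omega in (0.25, 0.5, 0.75):
        print(omega, phi(0.05, omega))
```

Under these assumptions the estimate is close to $2\alpha(1-\omega)$ for the values of $\omega$ shown, which is at most $\alpha$ exactly when $\omega \ge 0.5$, in agreement with the statement above.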

Credibility cannot be the only criterion, however,
since the belief with $S(U) = \mathbb{U}$ is always credible at
any level $\alpha \in (0,1)$. As an analogy, a frequentist test
with empty rejection region is certain to control the
Type-I error, but is practically useless; the idea is
to choose from those tests that control Type-I error
one with the largest rejection region. In the present
context, we want to choose from those $\alpha$-credible
maps the one that generates the "smallest" PRSs.
A convenient way to quantify the size of a PRS $S_\omega(U)$,
without using the geometry of $\mathbb{U}$, is to consider its
coverage probability $1 - Q_\omega$.

Definition 2. $(\mu, S_\omega)$ is as efficient as $(\mu, S_{\omega'})$
if

$$\varphi_\alpha(\omega) \ge \varphi_\alpha(\omega') \quad \text{for all } \alpha \in (0,1).$$

That is, the coverage probability $1 - Q_\omega$ is (stochastically)
no larger than the coverage probability $1 - Q_{\omega'}$.

Efficiency defines a partial ordering on those beliefs
that are credible at level $\alpha$. Then the level-$\alpha$
maximal belief ($\alpha$-MB) is, in some sense, the maximal
$(\mu, S_\omega)$ with respect to this partial ordering.
The basic idea is to choose, from among those credible
beliefs, one which is most efficient. Toward this,
let $\Omega_\alpha \subseteq \Omega$ index those maps which are credible
at level $\alpha$.

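Definition 3. For $\alpha \in (0,1)$, $S_{\omega^*}$ defines an
$\alpha$-MB if

$$\varphi_\alpha(\omega^*) = \sup_{\omega \in \Omega_\alpha} \varphi_\alpha(\omega). \tag{3.11}$$

Such an $\omega^*$ will be denoted by $\omega(\alpha)$.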

By the definition of $\Omega_\alpha$, it is clear that the supremum
on the right-hand side of (3.11) is bounded by
$\alpha$. Under fairly mild conditions on $\mathcal{S}$, we show in
Appendix A.1 that there exists an $\omega^* \in \Omega_\alpha$ such that

$$\varphi_\alpha(\omega^*) = \alpha, \tag{3.12}$$

so, consequently, $\omega^* = \omega(\alpha)$ specifies an $\alpha$-MB. We
will, henceforth, take (3.12) as our working definition
of MB. Uniqueness of a MB must be addressed
case-by-case, but the left-hand side of (3.12) often
has a certain monotonicity which can be used to
show the solution is unique.

We now turn to the important point of computing
the MB or, equivalently, the solution $\omega(\alpha)$ of the
equation (3.12). For this purpose, we recommend the
use of a stochastic approximation (SA) algorithm, 
due to Robbins and Monro [17]. Kushner and Yin 
[14] give a detailed theoretical account of SA, and 
Martin and Ghosh [16] give an overview and some 
recent statistical applications. 
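
To make the recommendation concrete, here is a minimal Robbins-Monro sketch for solving (3.12) in the simple interval class (3.6). Everything specific in it is an assumption made for illustration: $\mu$ is taken to be Unif(0,1), the closed form for $Q_\omega$ is derived for this sketch, and the gain sequence $c/(k + k_0)$, the starting value and the truncation to $[0, 0.99]$ are arbitrary tuning choices, not the authors' settings.

```python
import random


def noncoverage(u, omega):
    """Q_omega(u) for the interval PRS (3.6), assuming U ~ Unif(0, 1)."""
    if omega >= 1.0:
        return 0.0
    below = max(0.0, 1.0 - u / (1.0 - omega))
    above = max(0.0, (u - omega) / (1.0 - omega))
    return below + above


def robbins_monro(alpha, omega0=0.8, steps=200_000, c=10.0, k0=100, seed=0):
    """Stochastic approximation for the root of phi_alpha(omega) = alpha."""
    rng = random.Random(seed)
    omega = omega0
    for k in range(1, steps + 1):
        u_star = rng.random()
        # y is an unbiased one-draw estimate of phi_alpha(omega).
        y = 1.0 if noncoverage(u_star, omega) >= 1.0 - alpha else 0.0
        # phi_alpha decreases in omega near the root, so move up when y > alpha.
        omega += (c / (k + k0)) * (y - alpha)
        omega = min(max(omega, 0.0), 0.99)  # keep the iterate in a valid range
    return omega


if __name__ == "__main__":
    print(robbins_monro(0.05))
```

For this class the iterates should settle near $\omega = 0.5$, the smallest index for which (3.6) is credible; in less tractable classes $Q_\omega$ would itself have to be estimated by simulation inside each step.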

Putting all the components together, we now sum- 
marize the four basic steps of a MB analysis: 

1. Form a class $\mathcal{B} = (\mu, \mathcal{S})$ of candidate beliefs,
the choice of which may depend on (a) the assertions
of interest, (b) the nature of your personal
uncertainty, and/or (c) intuition and
geometric/computational simplicity.

2. Choose the desired credibility level a. 

3. Employ a stochastic approximation algorithm to 
find an a-MB as determined by the solution of 
(3.12). 

4. Compute the posterior belief and plausibility functions
via Monte Carlo integration by simulating
the PRSs $S_{\omega(\alpha)}(U)$.

In Sections 4 and 5 we will describe several specific 
classes of beliefs and the corresponding PRSs. These 
examples certainly will not exhaust all of the possibilities;
they do, however, shed light on the considerations
to be taken into account when constructing
a class $\mathcal{B}$ of beliefs.

4. HIGH-DIMENSIONAL TESTING

A major focus of current statistical research is
very-high-dimensional inference and, in particular,
multiple testing. This is partly due to new scientific
technologies, such as DNA microarrays and medical
imaging devices, that give experimenters access
to enormous amounts of data. A typical problem
is to make inference on an unknown $\Theta \in \mathbb{R}^n$ based
on an observed $X \sim N_n(\Theta, I_n)$; for example, testing
$H_{0i}: \Theta_i = 0$ for each $i = 1,\dots,n$. See Zhang and
Liu [25] for a maximal belief solution of this
many-normal-means problem. Below we consider a related
problem: testing homogeneity of a Poisson process.

Suppose we monitor a system over a pre-specified
interval of time, say, $[0, \tau]$. During that period of
time, we observe $n$ events/arrivals at times
$0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_n$, where the $(n+1)$st event,
taking place at $\tau_{n+1} > \tau$, is unobserved. Assume an
exponential model for the inter-arrival times
$X_i = \tau_i - \tau_{i-1}$, $i = 1,\dots,n$, that is,



$$X_i \sim \mathrm{Exp}(\Theta_i), \quad i = 1,\dots,n, \tag{4.1}$$

where the $X_i$'s are independent and the exponential
rates $\Theta_1,\dots,\Theta_n > 0$ are unknown. A question of
interest is whether the underlying process is homogeneous,
that is, whether the rates $\Theta_1,\dots,\Theta_n$ have a
common value. This question, or hypothesis, corresponds
to the assertion

$$A = \{\text{the process is homogeneous}\} = \{\Theta_1 = \Theta_2 = \cdots = \Theta_n\}. \tag{4.2}$$



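Let $(X, \Theta)$ be the real-world quantities of interest,
where $X = (X_1,\dots,X_n)$, $\Theta = (\Theta_1,\dots,\Theta_n)$ and
$\mathbb{X} = \mathbb{T} = (0,\infty)^n$. Define the auxiliary variable
$U = (R, P)$, where $R > 0$ and $P = (P_1,\dots,P_n)$ is in the
$(n-1)$-dimensional probability simplex $\mathbb{P}_{n-1} \subset \mathbb{R}^n$,
defined as

$$\mathbb{P}_{n-1} = \Bigl\{(p_1,\dots,p_n) \in [0,1]^n : \sum_{i=1}^n p_i = 1\Bigr\}.$$

The variables $R$ and $P$ are functions of the data
$X_1,\dots,X_n$ and the parameters $\Theta_1,\dots,\Theta_n$. The
a-equation $X = a(\Theta, U)$, in this case, is given by
$X_i = R P_i / \Theta_i$, where

$$R = \sum_{j=1}^n \Theta_j X_j \quad \text{and} \quad P_i = \frac{\Theta_i X_i}{\sum_{j=1}^n \Theta_j X_j}, \quad i = 1,\dots,n. \tag{4.3}$$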



To complete the specification of the sampling model,
we must choose the pivotal measure $\mu$ for the auxiliary
variable $U = (R, P)$. Given the nature of these
variates, a natural choice is the product measure

$$\mu = \mathrm{Gamma}(n, 1) \times \mathrm{Unif}(\mathbb{P}_{n-1}). \tag{4.4}$$

Fig. 4. Six realizations of $R$-cross sections of the PRS $S_\omega(R,P)$ in (4.5) in the case of $n = 3$. Here $\mathbb{P}_2$ is the triangular
region in the Barycentric coordinate system.

The measure $\mu$ in (4.4) is, indeed, consistent with
the exponential model (4.1). To see this, note that
$\mathrm{Unif}(\mathbb{P}_{n-1})$ is equivalent to the Dirichlet distribution
$\mathrm{Dir}(1_n)$, where $1_n$ is an $n$-vector of unity. Then,
conditional on $(\Theta_1,\dots,\Theta_n)$, it follows from standard
properties of the Dirichlet distribution that
$\Theta_1 X_1,\dots,\Theta_n X_n$ are i.i.d. $\mathrm{Exp}(1)$, which is
equivalent to (4.1).
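
The consistency of (4.4) with (4.1) can also be seen by simulation. The sketch below is an illustration with hypothetical function names: it draws $(R, P)$ from the product measure, generates data through $X_i = R P_i/\Theta_i$, and recovers $(R, P)$ through (4.3). The uniform draw on the simplex uses normalized independent Exp(1) variables, a standard way to sample $\mathrm{Dir}(1_n)$ that is an implementation choice of this sketch.

```python
import random


def draw_auxiliary(n, rng):
    """Draw U = (R, P) from mu = Gamma(n, 1) x Unif(simplex) as in (4.4)."""
    r = rng.gammavariate(n, 1.0)  # shape n, scale 1
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    p = [ei / total for ei in e]  # normalized exponentials are Dir(1_n)
    return r, p


def generate_data(rates, rng):
    """Apply the a-equation X_i = R * P_i / Theta_i for given rates Theta."""
    r, p = draw_auxiliary(len(rates), rng)
    x = [r * pi / th for pi, th in zip(p, rates)]
    return x, (r, p)


def recover_auxiliary(x, rates):
    """Invert the a-equation via (4.3): R = sum_j Theta_j X_j, P_i = Theta_i X_i / R."""
    scaled = [th * xi for th, xi in zip(rates, x)]
    r = sum(scaled)
    return r, [s / r for s in scaled]


if __name__ == "__main__":
    rng = random.Random(0)
    rates = [1.0, 2.0, 0.5]
    draws = [rates[0] * generate_data(rates, rng)[0][0] for _ in range(20_000)]
    print(sum(draws) / len(draws))  # sample mean of Theta_1 * X_1
```

If the construction is right, the sample mean of $\Theta_1 X_1$ should be close to 1, the mean of an $\mathrm{Exp}(1)$ variable.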

We now proceed with the WB analysis. Step 1 is to
define the class of mappings $\mathcal{S}$ for prediction of the
unobserved auxiliary variables $U^* = (R^*, P^*)$. To
expand a random draw $U = (R, P) \sim \mu$ to a random
set, consider the class of maps $\mathcal{S} = \{S_\omega : \omega \in [0,\infty]\}$
defined as

$$S_\omega(U) = \{(r, p) \in [0,\infty) \times \mathbb{P}_{n-1} : K(P, p) \le \omega\}, \tag{4.5}$$

where $K(P, p)$ is the Kullback-Leibler (KL) divergence

$$K(P, p) = \sum_{i=1}^n P_i \log(P_i / p_i), \quad p, P \in \mathbb{P}_{n-1}. \tag{4.6}$$



Several comments on the choice of PRSs (4.5) are
in order. First, notice that $S_\omega(U)$ does not constrain
the value of $R$, that is, $S_\omega(U)$ is just a cylinder in
$[0,\infty) \times \mathbb{P}_{n-1}$ defined by the $P$-component of $U$.
Second, the use of the KL divergence in (4.5) is
motivated by the correspondence between $\mathbb{P}_{n-1}$ and
the set of all probability measures on $\{1, 2, \dots, n\}$.
The KL divergence is a convenient tool for defining
neighborhoods in $\mathbb{P}_{n-1}$. Figure 4 shows cross-sections
of several random sets $S_\omega(U)$ in the case of
$n = 3$.

After choosing a credibility level $\alpha \in (0,1)$, we are
on to Step 3 of the analysis: finding an $\alpha$-MB. As in
Section 3, define

$$Q_\omega(r, p) = \mu\{(R, P) : S_\omega(R, P) \not\ni (r, p)\},$$

and, finally, choose $\omega = \omega(\alpha)$ to solve the equation

$$\mu\{(R^*, P^*) : Q_\omega(R^*, P^*) \ge 1 - \alpha\} = \alpha.$$

This calculation requires stochastic approximation.
For Step 4, first define the mapping $P : \mathbb{T} \to \mathbb{P}_{n-1}$
by the component-wise formula
$P_i(\Theta) = \Theta_i X_i / \sum_j \Theta_j X_j$, $i = 1,\dots,n$. For inference
on $\Theta = (\Theta_1,\dots,\Theta_n)$, a posterior focal element is of the form

$$M_X(R, P; S_{\omega(\alpha)}) = \{\Theta : K(P, P(\Theta)) \le \omega(\alpha)\}.$$

For the homogeneity assertion $A$ in (4.2) the posterior
belief function is zero, but the plausibility is
given by

$$\mathrm{Pl}_X(A; S_{\omega(\alpha)}) = 1 - \mu\{(R, P) : K(P, P(1_n)) > \omega(\alpha)\},$$

where $P_i(1_n) = X_i / \sum_j X_j$. Since $P(1_n)$ is known
and $P \sim \mathrm{Unif}(\mathbb{P}_{n-1})$ is easy to simulate, once $\omega(\alpha)$ is
available, the plausibility can be readily calculated
using Monte Carlo.
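
A minimal sketch of this Monte Carlo step follows. It is not the authors' implementation: the function names are hypothetical, and the value of `omega` passed in is a placeholder supplied by the user, since computing the calibrated $\omega(\alpha)$ requires the separate stochastic approximation step.

```python
import math
import random


def kl(P, p):
    """Kullback-Leibler divergence K(P, p) in (4.6); terms with P_i = 0 contribute 0."""
    return sum(Pi * math.log(Pi / pi) for Pi, pi in zip(P, p) if Pi > 0.0)


def uniform_simplex(n, rng):
    """Draw P ~ Unif(simplex) by normalizing independent Exp(1) variables."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [ei / total for ei in e]


def plausibility_homogeneous(x, omega, draws=5_000, seed=0):
    """Monte Carlo estimate of Pl_X(A; S_omega) = mu{K(P, P(1_n)) <= omega}.
    `omega` is a user-supplied placeholder, not the calibrated omega(alpha)."""
    rng = random.Random(seed)
    total = sum(x)
    p_obs = [xi / total for xi in x]  # P(1_n), known once X is observed
    n = len(x)
    hits = sum(kl(uniform_simplex(n, rng), p_obs) <= omega for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    x = [0.8, 1.1, 0.9, 1.3]
    for omega in (0.05, 0.5):
        print(omega, plausibility_homogeneous(x, omega))
```

Only the $P$-component of the auxiliary variable is simulated because, as noted for (4.5), the PRS does not constrain $R$.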

In order to assess the performance of the MB
method above in testing homogeneity, we will compare
it with the typical likelihood ratio (LR) test.
Let $\ell(\Theta)$ be the likelihood function under the general
model (4.1). Then the LR test statistic for
$H_0: \Theta_1 = \cdots = \Theta_n$ is given by

$$L_0 = \frac{\sup\{\ell(\Theta) : \Theta \in H_0\}}{\sup\{\ell(\Theta) : \Theta \in H_0 \cup H_0^c\}} = \Bigl[\frac{(\prod_{i=1}^n X_i)^{1/n}}{\bar X}\Bigr]^n,$$

a power of the ratio of the geometric and arithmetic
means. If $P$ is as defined before, then a little algebra
shows that

$$L = -\log L_0 = n K(u_n, P(1_n)),$$

where $u_n$ is the $n$-vector $n^{-1} 1_n$ which corresponds to
the uniform distribution on $\{1, 2, \dots, n\}$. Note that
this problem is invariant under the group of scale
transformations, so the null distribution of $P(1_n)$
and, hence, $L$ is independent of the common value of
the rates $\Theta_1,\dots,\Theta_n$. In fact, under the homogeneity
assertion (4.2), $P(1_n) \sim \mathrm{Unif}(\mathbb{P}_{n-1})$.
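
The "little algebra" can be confirmed numerically. The sketch below, with hypothetical function names, computes $-\log L_0 = n \log \bar X - \sum_i \log X_i$ directly from the geometric/arithmetic mean form and compares it with $n K(u_n, P(1_n))$.

```python
import math


def neg_log_lr(x):
    """-log L_0 = n * log(arithmetic mean) - sum_i log(x_i)."""
    n = len(x)
    mean = sum(x) / n
    return n * math.log(mean) - sum(math.log(xi) for xi in x)


def n_times_kl_uniform(x):
    """n * K(u_n, P(1_n)) with u_n uniform on {1, ..., n} and P_i(1_n) = x_i / sum(x)."""
    n = len(x)
    total = sum(x)
    return n * sum((1.0 / n) * math.log((1.0 / n) / (xi / total)) for xi in x)


if __name__ == "__main__":
    x = [0.3, 1.7, 2.2, 0.9]
    print(neg_log_lr(x), n_times_kl_uniform(x))
```

Both quantities vanish when all the $X_i$ are equal and grow as the inter-arrival times become more uneven.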

Example 6. To compare the MB and LR tests
of homogeneity described above, we performed a
simulation. Take $n = n_1 + n_2 = 100$, $n_1$ of the rates
$\Theta_1,\dots,\Theta_n$ to be 1 and $n_2$ of the rates to be $\theta$, for
various values of $\theta$. For each of 1000 simulated data
sets, we computed the plausibility for $A$ in (4.2). To
perform the hypothesis test using this plausibility, we
choose a nominal 5% level and say "reject the
homogeneity hypothesis if plausibility $\le 0.05$." The
powers of the two tests are summarized in Figure 5,
where we see that the MB test is noticeably better than
the LR test. The MB test also controls the frequentist
Type-I error at 0.05. But note that, unlike the LR test,
the MB test is based on a meaningful data-dependent
measure of the amount of evidence supporting the
homogeneity assertion.

5. NONPARAMETRICS 

A fundamental problem in nonparametric inference
is the so-called one-sample test. Specifically, assume
that $X_1,\dots,X_n$ are i.i.d. observations from a
distribution on $\mathbb{R}$ with CDF $F$ in a class $\mathbb{F}$ of CDFs;
the goal is to test $H_0: F \in \mathbb{F}_0$ where $\mathbb{F}_0 \subset \mathbb{F}$ is given.
One application is a test for normality, that is, where
$\mathbb{F}_0 = \{N(\theta, \sigma^2) \text{ for some } \theta \text{ and } \sigma^2\}$. This is an
important problem, since many popular methods in
applied statistics, such as regression and analysis of
variance, often require an approximate normal
distribution of the data, of residuals, etc.

We restrict attention to the simple one-sample
testing problem, where $\mathbb{F}_0 = \{F_0\} \subset \mathbb{F}$ is a singleton.
Our starting point is the a-equation

$$X_i = F^{-1}(U_i), \quad F \in \mathbb{F}, \quad i = 1,\dots,n, \tag{5.1}$$

where $U_1,\dots,U_n$ are i.i.d. Unif(0,1). Since $F$ is
monotonically increasing, it is sufficient to consider the
ordered data $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, the corresponding
ordered auxiliary variables $U = (U_{(1)},\dots,U_{(n)})$,
and pivotal measure $\mu$ determined by the distribution
of $U$.

In this section we present a slightly different form 
of WB analysis based on hierarchical PRSs. In hier- 
archical Bayesian analysis, a random prior is taken 
to add an additional layer of flexibility. The intuition
here is similar, but we defer the discussion and
technical details to Appendix A.2.

For predicting $U^*$, we consider a class of beliefs
indexed by $\Omega = [0,\infty]$, whose PRSs are small $n$-boxes
inside the unit $n$-box $[0,1]^n$. Start with a fixed
set-valued mapping that takes ordered $n$-vectors
$u \in [0,1]^n$, points $z \in (0.5, 1)$, and forms the intervals
$[A_i(z), B_i(z)]$, where

$$A_i(z) = \mathrm{qBeta}(p_i - z p_i \mid i, n+1-i), \qquad B_i(z) = \mathrm{qBeta}(p_i + z(1-p_i) \mid i, n+1-i) \tag{5.2}$$

and $p_i = \mathrm{pBeta}(u_{(i)} \mid i, n-i+1)$. Here pBeta and
qBeta denote the CDF and inverse CDF of the Beta
distribution, respectively. Then the mapping $S(u, z)$
is just the Cartesian product of these $n$ intervals; cf.
Example 5. Now sample $U$ and $Z$ from a suitable
distribution depending on $\omega$:

• Take a draw $U$ of $n$ ordered Unif(0,1) variables.

• Take $V \sim \mathrm{Beta}(\omega, 1)$ and set $Z = \tfrac{1}{2}(1 + V)$.

The result is a random set $S(U, Z) \in 2^{\mathbb{U}}$. We call
this approach "hierarchical" because one could first
sample $Z = z$ from the transformed beta distribution
indexed by $\omega$, fix the map $S(\cdot, z)$, and then
sample $U$.
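
The two sampling steps and the intervals in (5.2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical; pBeta is computed through the binomial-tail identity for integer parameters, $\mathrm{pBeta}(u \mid i, n-i+1) = \sum_{k \ge i} \binom{n}{k} u^k (1-u)^{n-k}$; qBeta is obtained by bisection; and $V \sim \mathrm{Beta}(\omega, 1)$ is drawn as $W^{1/\omega}$ with $W$ uniform, which requires $0 < \omega < \infty$.

```python
import math
import random


def pbeta_int(u, i, n):
    """CDF at u of Beta(i, n - i + 1), via the binomial tail identity."""
    return sum(math.comb(n, k) * u**k * (1.0 - u) ** (n - k)
               for k in range(i, n + 1))


def qbeta_int(p, i, n, tol=1e-12):
    """Inverse of pbeta_int in its first argument, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pbeta_int(mid, i, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hierarchical_prs(n, omega, rng):
    """Draw (U, Z) and return the box S(U, Z) built from the intervals in (5.2)."""
    u = sorted(rng.random() for _ in range(n))  # ordered Unif(0, 1) draws
    v = rng.random() ** (1.0 / omega)           # V ~ Beta(omega, 1), 0 < omega < inf
    z = 0.5 * (1.0 + v)
    box = []
    for i, ui in enumerate(u, start=1):
        p = pbeta_int(ui, i, n)
        a = qbeta_int(p - z * p, i, n)
        b = qbeta_int(p + z * (1.0 - p), i, n)
        box.append((a, b))
    return u, z, box


if __name__ == "__main__":
    u, z, box = hierarchical_prs(5, 2.0, random.Random(0))
    print(z)
    for ui, (a, b) in zip(u, box):
        print(round(a, 4), round(ui, 4), round(b, 4))
```

Larger $\omega$ pushes $V$, and hence $Z$, toward 1, so the boxes widen toward the whole unit cube; a numerical library with beta quantile routines could replace the bisection.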

For a draw $(U, Z)$, the posterior focal elements
for $F$ look like

$$M_X(U, Z; S) = \{F : A_i(Z) \le F(X_{(i)}) \le B_i(Z),\ \forall i = 1,\dots,n\}.$$

Details of the credibility of this belief, in a more general
context, are given in Appendix A.2. Stochastic
approximation is used, as in Section 4, to optimize the choice
of $\omega$. The MB method uses the posterior focal elements
above, with optimal $\omega$, to compute the posterior
belief and plausibility functions for the assertion
$A = \{F = F_0\}$ of interest.

Example 7. To illustrate the performance of
the MB method, we present a small simulation study.
We take $F_0$ to be the CDF of a Unif(0,1) distribution.
Samples $X_1,\dots,X_n$, for various sample sizes $n$,
are taken from several nonuniform distributions and
the power of MB, along with some of the classical
tests, is computed. We have chosen our nonuniform
alternatives to be $\mathrm{Beta}(\beta_1, \beta_2)$ for various values of
$(\beta_1, \beta_2)$. For the MB test, we use the decision rule
"reject $H_0$ if plausibility $\le 0.05$." Figure 6 shows the
power of the level $\alpha = 0.05$ Kolmogorov-Smirnov
(KS), Anderson-Darling (AD), Cramer-von Mises
(CV) and MB tests, as functions of the sample size
$n$ for six pairs of $(\beta_1, \beta_2)$. From the plots we see that
the MB test outperforms the three classical tests in
terms of power in all cases, in particular, when $n$
is relatively small and the alternative is symmetric
and "close" to the null [i.e., when $(\beta_1, \beta_2) \approx (1, 1)$].
Here, as in Example 6, the MB test also controls the
Type-I error at level $\alpha = 0.05$.

Fig. 5. Power of the MB and LR tests of homogeneity in Example 6, where $\theta$ is the ratio of the rate for the last $n_2$
observations to the rate of the first $n_1$ observations. Left: $(n_1, n_2) = (50, 50)$. Right: $(n_1, n_2) = (10, 90)$.

6. DISCUSSION 

In this paper we have considered a modification
of the DS theory in which some desired frequency 
properties can be realized while, at the same time, 
the essential components of DS inference, such as 
"don't know," remain intact. The WB method was 
justified within a more general framework of inferen- 
tial models, where posterior probability-based infer- 
ence with frequentist properties is the primary goal. 
In two high-dimensional hypothesis testing prob- 
lems, the MB method performs quite well compared 
to popular frequentist methods in terms of power — 
more work is needed to fully understand this rela- 
tionship between WB/MB hypothesis testing and 
frequentist power. Also, the detail in which these 
examples were presented should shed light on how 
MB can be applied in practice. 

One potential criticism of the WB method is the
lack of uniqueness of the a-equations and PRS
mappings $\mathcal{S}$. At this stage, there are no optimality
results justifying any particular choices. Our approach
thus far has been to consider relatively simple and 
intuitive ways of constructing PRSs, but further re- 
search is needed to define these optimality criteria 
and to design PRSs that satisfy these criteria. 

In addition to the applications shown above, pre- 
liminary results of WB methods in other statistical 
problems are quite promising. We hope that this 
work on WBs will inspire both applied and theoretical
statisticians to take another look at what DS
has to offer.

APPENDIX: TECHNICAL RESULTS 

A.1 Existence of a MB

Consider a class $\mathcal{S} = \{S_\omega : \omega \in \Omega\}$ of set-valued
mappings. Assume that the index set $\Omega$ is a complete
metric space. Each $S_\omega$, together with the pivotal
measure $\mu$, defines a belief function $\mu S_\omega^{-1}$ on $\mathbb{U}$.
Here we show that there is an $\omega = \omega(\alpha)$ that solves
the equation (3.12). To this end, we make the following
assumptions:

A1. Both the conventional and vacuous beliefs are
encoded in $\mathcal{S}$.
A2. If $\omega_n \to \omega$, then $S_{\omega_n}(u) \to S_\omega(u)$ for each $u \in \mathbb{U}$.

Condition A1 is to make sure that $\mathcal{S}$ is suitably
rich, while A2 imposes a sort of continuity on the
sets $S_\omega \in \mathcal{S}$.

Proposition 1. Under assumptions A1-A2,
there exists a solution $\omega(\alpha)$ to (3.12) for any
$\alpha \in (0,1)$.

Proof. For notational simplicity, we write $Q(\omega, u)$
for $Q_\omega(u)$. We start by showing $Q(\omega, u)$ is continuous
in $\omega$. Choose $u \in \mathbb{U}$ and a sequence $\omega_n \to \omega$.
Then under A2

$$Q(\omega_n, u) = \int I_{\{S_{\omega_n}(v) \not\ni u\}} \, d\mu(v) \to \int I_{\{S_\omega(v) \not\ni u\}} \, d\mu(v) = Q(\omega, u)$$

by the dominated convergence theorem (DCT). Since
$\omega_n \to \omega$ was arbitrary and $\Omega$ is a metric space, it follows
that $Q(\cdot, u)$ is continuous on $\Omega$.

Write $\varphi(\omega)$ for $\varphi_\alpha(\omega)$ in (3.9); we will now show
that $\varphi(\cdot)$ is continuous. Again choose $\omega \in \Omega$ and a
sequence $\omega_n \to \omega$. Define $J_\omega(u) = I_{\{Q(\omega, u) \ge 1 - \alpha\}}$, so
that $\varphi(\omega) = \int J_\omega(u) \, d\mu(u)$. Since

$$|\varphi(\omega_n) - \varphi(\omega)| \le \int |J_{\omega_n}(u) - J_\omega(u)| \, d\mu(u)$$

and the integrand on the right-hand side is bounded
by 2, it follows, again by the DCT, that
$\varphi(\omega_n) \to \varphi(\omega)$ and, hence, that $\varphi(\cdot)$ is continuous
on $\Omega$. But A1 implies that $\varphi(\cdot)$ takes values 0 and 1
on $\Omega$ so, by the intermediate value theorem, for any
$\alpha \in (0,1)$, there exists a solution $\omega = \omega(\alpha)$ to the
equation $\varphi(\omega) = \alpha$. □

A.2 Hierarchical PRSs 

In Section 5 we considered a WB analysis with 
hierarchical PRSs. The purpose of this generaliza- 
tion is to provide a more flexible choice of random 
sets for predicting the unobserved $U^*$. Here we give
a theoretical justification along the lines in Section
3.4.

Fig. 6. Power comparison for the one-sample tests in Example 7 at level $\alpha = 0.05$ for various values of $n$. The six alternatives
are (a) Beta(0.8, 0.8); (b) Beta(1.3, 1.3); (c) Beta(0.6, 0.6); (d) Beta(1.6, 1.6); (e) Beta(0.6, 0.8); (f) Beta(1.3, 1.6).



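Let $\omega \in \Omega$ index a family of probability measures
$\lambda_\omega$ on a space $\mathbb{Z}$, and suppose $S(\cdot, \cdot)$ is a fixed
set-valued mapping $\mathbb{U} \times \mathbb{Z} \to 2^{\mathbb{U}}$, assumed to satisfy
$U \in S(U, Z)$ for all $Z$. A hierarchical PRS is defined
by first taking $Z \sim \lambda_\omega$ and then choosing the map
$S_Z = S(\cdot, Z)$ defined on $\mathbb{U}$. This amounts to a
product pivotal measure $\mu \times \lambda_\omega$. Toward credibility
of $(\mu \times \lambda_\omega, S)$, define the noncoverage probability

$$Q_\omega(u) = (\mu \times \lambda_\omega)\{(U, Z) : S(U, Z) \not\ni u\} = \int Q_z(u) \, d\lambda_\omega(z),$$

a mixture of the noncoverage probabilities in (3.8).
Then we have the following, more general, definition
of credibility.

Definition 4. $(\mu \times \lambda_\omega, S)$ is credible at level $\alpha$ if

$$\varphi_\alpha(\omega) := \mu\{U^* : Q_\omega(U^*) \ge 1 - \alpha\} \le \alpha.$$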

Beliefs which are credible in the sense of Definition 1
are also credible according to Definition 4
(take $\lambda_\omega$ to be a point mass at $\omega$). It is also clear
that if $(\mu, S_z)$ is credible in the sense of Definition 1
for all $z \in \mathbb{Z}$, then $(\mu \times \lambda_\omega, S)$ will also be credible.
Next we generalize Theorem 1 to handle the case of
hierarchical PRSs.

Theorem 2. Suppose that $(\mu \times \lambda_\omega, S)$ is credible
at level $\alpha$ in the sense of Definition 4, and that
$(\mu \times \lambda_\omega)\{(U, Z) : M_X(U, Z; S) \ne \emptyset\} = 1$. Then for
any assertion $A \subseteq \mathbb{T}$, the belief function
$\mathrm{Bel}_X(A; S) = (\mu \times \lambda_\omega) S^{-1}(A)$ satisfies

$$P_\Theta\{\mathrm{Bel}_X(A; S) \ge 1 - \alpha\} \le \alpha, \quad \Theta \in A^c.$$

Proof. Start by fixing $Z = z$, and write $S_z(\cdot) = S(\cdot, z)$.
For $\Theta \in A^c$, monotonicity of the belief function
gives

$$\mathrm{Bel}_X(A; S_z) \le \mathrm{Bel}_X(\{\Theta\}^c; S_z) = \mu\{U : M_X(U; S_z) \not\ni \Theta\}.$$

When $\Theta$ is the true value, the event $M_X(U; S_z) \not\ni \Theta$
is equivalent to $S_z(U) \not\ni U^*$; consequently,

$$\mathrm{Bel}_X(A; S_z) \le \mu\{U : S_z(U) \not\ni U^*\} = Q_z(U^*).$$

For the hierarchical PRS, the belief function satisfies

$$\begin{aligned}
\mathrm{Bel}_X(A; S) &= (\mu \times \lambda_\omega)\{(U, Z) : M_X(U, Z; S) \subseteq A\} \\
&= \int \mu\{U : M_X(U; S_z) \subseteq A\} \, d\lambda_\omega(z) \\
&= \int \mathrm{Bel}_X(A; S_z) \, d\lambda_\omega(z) \\
&\le \int Q_z(U^*) \, d\lambda_\omega(z) = Q_\omega(U^*).
\end{aligned}$$

The claim now follows from credibility of the belief
$(\mu \times \lambda_\omega, S)$. □